adapter network
Gradient-free Continual Learning
Continual learning (CL) presents a fundamental challenge in training neural networks on sequential tasks without experiencing catastrophic forgetting. Traditionally, the dominant approach in CL has been gradient-based optimization, where updates to the network parameters are performed using stochastic gradient descent (SGD) or its variants. However, a major limitation arises when previous data is no longer accessible, as is often assumed in CL settings. In such cases, there is no gradient information available for past data, leading to uncontrolled parameter changes and consequently severe forgetting of previously learned tasks. By shifting focus from data availability to gradient availability, this work opens up new avenues for addressing forgetting in CL. We explore the hypothesis that gradient-free optimization methods can provide a robust alternative to conventional gradient-based continual learning approaches. We discuss the theoretical underpinnings of such methods, analyze their potential advantages and limitations, and present empirical evidence supporting their effectiveness. By reconsidering the fundamental cause of forgetting, this work aims to contribute a fresh perspective to the field of continual learning and inspire novel research directions.
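The abstract does not say which gradient-free optimizer is used, so the following is only a generic illustration of what "gradient-free" means: parameters are updated from loss values alone. The toy objective, the function names, and the hill-climbing rule are all assumptions made for this sketch, not the authors' method.

```python
import random

def loss(params, target):
    """Toy objective: squared error between a parameter vector and a target."""
    return sum((p - t) ** 2 for p, t in zip(params, target))

def random_search(params, target, steps=2000, sigma=0.1, seed=0):
    """Hill climbing with Gaussian perturbations.

    Only loss values are queried; no gradient is ever computed.
    """
    rng = random.Random(seed)
    best = list(params)
    best_loss = loss(best, target)
    for _ in range(steps):
        candidate = [p + rng.gauss(0.0, sigma) for p in best]
        cand_loss = loss(candidate, target)
        if cand_loss < best_loss:  # accept only strict improvements
            best, best_loss = candidate, cand_loss
    return best, best_loss

start = [0.0, 0.0, 0.0]
target = [1.0, -2.0, 0.5]
initial_loss = loss(start, target)
final, final_loss = random_search(start, target)
```

Because a candidate is kept only when it lowers the loss, the loss can never increase from one step to the next. A continual-learning variant would still need some way to score candidates against earlier tasks, which this toy does not attempt.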
Cross-Problem Learning for Solving Vehicle Routing Problems
Lin, Zhuoyi, Wu, Yaoxin, Zhou, Bangjian, Cao, Zhiguang, Song, Wen, Zhang, Yingqian, Jayavelu, Senthilnath
Existing neural heuristics often train a deep architecture from scratch for each specific vehicle routing problem (VRP), ignoring the transferable knowledge across different VRP variants. This paper proposes the cross-problem learning to assist heuristics training for different downstream VRP variants. Particularly, we modularize neural architectures for complex VRPs into 1) the backbone Transformer for tackling the travelling salesman problem (TSP), and 2) the additional lightweight modules for processing problem-specific features in complex VRPs. Accordingly, we propose to pretrain the backbone Transformer for TSP, and then apply it in the process of fine-tuning the Transformer models for each target VRP variant. On the one hand, we fully fine-tune the trained backbone Transformer and problem-specific modules simultaneously.
Among the studied COPs, the Vehicle Routing Problems (VRPs) are often favoured and chosen to verify the effectiveness of the NCO methods, especially the Traveling Salesman Problem (TSP) and Capacitated Vehicle Routing Problem (CVRP). On the one hand, VRPs are widely applied in real-world scenarios such as logistics, and drone delivery [Wang and Sheu, 2019; Konstantakopoulos et al., 2022]. On the other hand, VRPs are known to be NP-complete problems, and many of them are challenging to be solved efficiently. With the advances of deep learning and its power to automatically learn neural heuristics, NCO methods have demonstrated notable promise against traditional heuristics [Kool et al., 2018; Kwon et al., 2020; Li et al., 2021; Luo et al., 2023]. To further strengthen NCO methods, a number of recent endeavors have been paid to enhance generalization capabilities, which attempt to ameliorate the performance of the neural heuristics in solving the VRP instances with distributions or sizes unseen during training [Geisler et al., 2022;
TRAWL: External Knowledge-Enhanced Recommendation with LLM Assistance
Luo, Weiqing, Song, Chonggang, Yi, Lingling, Cheng, Gong
Combining semantic information with behavioral data is a crucial research area in recommender systems. A promising approach involves leveraging external knowledge to enrich behavioral-based recommender systems with abundant semantic information. However, this approach faces two primary challenges: denoising raw external knowledge and adapting semantic representations. To address these challenges, we propose an External Knowledge-Enhanced Recommendation method with LLM Assistance (TRAWL). This method utilizes large language models (LLMs) to extract relevant recommendation knowledge from raw external data and employs a contrastive learning strategy for adapter training. Experiments on public datasets and real-world online recommender systems validate the effectiveness of our approach.
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
Liu, Fangcheng, Tang, Yehui, Liu, Zhenhua, Ni, Yunsheng, Han, Kai, Wang, Yunhe
Speculative decoding has demonstrated its effectiveness in accelerating the inference of large language models while maintaining a consistent sampling distribution. However, the conventional approach of training a separate draft model to achieve a satisfactory token acceptance rate can be costly. Drawing inspiration from early exiting, we propose a novel self-speculative decoding framework \emph{Kangaroo}, which uses a fixed shallow sub-network as a self-draft model, with the remaining layers serving as the larger target model. We train a lightweight and efficient adapter module on top of the sub-network to bridge the gap between the sub-network and the full model's representation ability. It is noteworthy that the inference latency of the self-draft model may no longer be negligible compared to the large model, necessitating strategies to increase the token acceptance rate while minimizing the drafting steps of the small model. To address this challenge, we introduce an additional early exiting mechanism for generating draft tokens. Specifically, we halt the small model's subsequent prediction during the drafting phase once the confidence level for the current token falls below a certain threshold. Extensive experiments on the Spec-Bench demonstrate the effectiveness of Kangaroo. Under single-sequence verification, Kangaroo achieves speedups up to $1.68\times$ on Spec-Bench, outperforming Medusa-1 with 88.7\% fewer additional parameters (67M compared to 591M). The code for Kangaroo is available at https://github.com/Equationliu/Kangaroo.
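The second early-exit mechanism described above (stop drafting once the draft model's confidence falls below a threshold) can be sketched as a simple loop. This is a minimal sketch under stated assumptions, not the Kangaroo implementation: the `draft_step` interface, the default `threshold` and `max_draft` values, and the choice to discard the low-confidence token are all hypothetical, and verification by the full model is omitted.

```python
from typing import Callable, List, Tuple

# Hypothetical draft-model interface: given the tokens so far, return
# (next_token, confidence), with confidence taken as the top-1 probability.
DraftFn = Callable[[List[int]], Tuple[int, float]]

def draft_with_early_exit(prefix: List[int], draft_step: DraftFn,
                          max_draft: int = 6, threshold: float = 0.6) -> List[int]:
    """Draft tokens until confidence drops below threshold or max_draft is hit."""
    drafted: List[int] = []
    context = list(prefix)
    for _ in range(max_draft):
        token, confidence = draft_step(context)
        if confidence < threshold:
            break  # assumption: drop the uncertain token and stop drafting
        drafted.append(token)
        context.append(token)
    return drafted

def make_toy_draft(confidences: List[float]) -> DraftFn:
    """Toy stand-in for the shallow sub-network plus adapter."""
    state = {"i": 0}
    def step(context: List[int]) -> Tuple[int, float]:
        conf = confidences[state["i"] % len(confidences)]
        state["i"] += 1
        return len(context), conf  # arbitrary token id: current context length
    return step

drafted = draft_with_early_exit([101, 102], make_toy_draft([0.9, 0.8, 0.5, 0.95]))
full_run = draft_with_early_exit([101, 102], make_toy_draft([0.9]))
```

In the first call the third draft token has confidence 0.5, below the 0.6 threshold, so drafting stops after two tokens. In the second call confidence never drops, so the loop runs to `max_draft`.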
FedSIS: Federated Split Learning with Intermediate Representation Sampling for Privacy-preserving Generalized Face Presentation Attack Detection
Alkhunaizi, Naif, Srivatsan, Koushik, Almalik, Faris, Almakky, Ibrahim, Nandakumar, Karthik
Lack of generalization to unseen domains/attacks is the Achilles heel of most face presentation attack detection (FacePAD) algorithms. Existing attempts to enhance the generalizability of FacePAD solutions assume that data from multiple source domains are available with a single entity to enable centralized training. In practice, data from different source domains may be collected by diverse entities, who are often unable to share their data due to legal and privacy constraints. While collaborative learning paradigms such as federated learning (FL) can overcome this problem, standard FL methods are ill-suited for domain generalization because they struggle to surmount the twin challenges of handling non-iid client data distributions during training and generalizing to unseen domains during inference. In this work, a novel framework called Federated Split learning with Intermediate representation Sampling (FedSIS) is introduced for privacy-preserving domain generalization. In FedSIS, a hybrid Vision Transformer (ViT) architecture is learned using a combination of FL and split learning to achieve robustness against statistical heterogeneity in the client data distributions without any sharing of raw data (thereby preserving privacy). To further improve generalization to unseen domains, a novel feature augmentation strategy called intermediate representation sampling is employed, and discriminative information from intermediate blocks of a ViT is distilled using a shared adapter network. The FedSIS approach has been evaluated on two well-known benchmarks for cross-domain FacePAD to demonstrate that it is possible to achieve state-of-the-art generalization performance without data sharing. Code: https://github.com/Naiftt/FedSIS
Linear Representation Meta-Reinforcement Learning for Instant Adaptation
Peng, Matt, Zhu, Banghua, Jiao, Jiantao
This paper introduces Fast Linearized Adaptive Policy (FLAP), a new meta-reinforcement learning (meta-RL) method that extrapolates well to out-of-distribution tasks without reusing training data and adapts almost instantaneously from only a few samples at test time. FLAP builds upon the idea of learning a shared linear representation of the policy so that when adapting to a new task, it suffices to predict a set of linear weights. A separate adapter network is trained simultaneously with the policy so that, during adaptation, the adapter network directly predicts these linear weights; prior meta-RL methods such as MAML instead obtain the new policy by updating a meta-policy via gradient descent. Using this separate feed-forward network not only speeds up adaptation run-time significantly, but also generalizes extremely well to very different tasks on which prior meta-RL methods fail to generalize. Experiments on standard continuous-control meta-RL benchmarks show that FLAP achieves significantly stronger performance on out-of-distribution tasks, with up to double the average return and up to 8x faster adaptation run-time compared to prior methods.
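The adaptation-time structure described above (a policy that is linear in shared features, with an adapter that outputs the task-specific weights in one forward pass) might look roughly like the sketch below. Everything here is a hypothetical stand-in: in FLAP both the shared features and the adapter are learned networks, whereas `shared_features` and `toy_adapter` are fixed hand-written functions, and the `(state, action, reward)` sample format is an assumption.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float, float]  # assumed (state, action, reward)

def shared_features(state: Sequence[float]) -> List[float]:
    """Stand-in for the shared representation (learned in the paper, fixed here)."""
    return [math.tanh(state[0]), math.tanh(state[1]), 1.0]

def policy_action(state: Sequence[float], weights: Sequence[float]) -> float:
    """The action is linear in the shared features: a = w . phi(s)."""
    return sum(w * f for w, f in zip(weights, shared_features(state)))

def toy_adapter(samples: Sequence[Sample]) -> List[float]:
    """Hypothetical adapter: maps a few task samples to linear weights.

    No gradient step is taken at adaptation time; this is a single function call.
    """
    mean_reward = sum(r for _, _, r in samples) / len(samples)
    return [mean_reward, -mean_reward, 0.5]

few_samples = [((0.0, 0.0), 0.1, 1.0), ((1.0, 0.0), 0.2, 3.0)]
weights = toy_adapter(few_samples)
action = policy_action((0.0, 0.0), weights)
```

The point of the sketch is the control flow: adapting to a new task costs one adapter call plus a dot product per action, rather than an inner loop of gradient updates.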